Return to Homepage

Introduction

In the following report, I will investigate and examine which animal is the king of attacks whether on land or in the water, by comparing the attacks from sharks, wolves, and alligators. While this data has already been somewhat cleaned and contains fairly simple information, this final project aims to investigate the disparities between these three animals, the attack locations, and the effected population. In simpler terms, who is responsible for the most attacks? Which populations are the most effected? Where these attacks provoked or not? These are just some of the questions I hope to explore in this writing.

Background

This project was inspired in part by my family, my mother is from Florida and there have been some very interesting documentaries done on all these animals. It was also inspired by general happen stance: I found a very nice data set on Kaggle and felt it could be an interesting take on a seemingly dull data set. As mentioned, this data set was found:

Fatal Alligator Attacks

Shark Attacks

Shark Attacks By Hemispheres

Global Wolf Attacks

The data set consists of various reported statistics regarding these different attacks. Most individuals were attacked were predatory. However, some individuals were reported to provoke the animal. Different age groups were attacked each time.

Prevention would be great, but there is not signs for those that will and could be attacked. Alligators, sharks, and wolves have different reasons to attack.

Data

# load all necessary libraries 
library(tidyverse)
library(janitor)
library(leaflet)
library(dplyr)
library(ggplot2)
library(lubridate)
library(stringr)

# reading in kaggle data sets 
gator <- read.csv("predators/fatal_alligator_attacks_US.csv")

g_wolves <- read.csv("predators/global_wolves.csv")

shark_1 <- read.csv("predators/Shark_attacks/attacks.csv")

shark_2 <- read.csv("predators/shark_attacks.csv")

shark_3 <- read.csv("predators/Shark_attacks/list_coor_australia.csv")

This data set was retrieved from Kaggle and has already been somewhat cleaned for analysis. However, there are some changes I wanted to make to the data structure, changing missing values/empty spaces, cleaning the names, and improving characters found throughout the data set. The following code and outputs demonstrate the changes I’ve made to allow for smoother data analysis:

# convert column titles to snake_case 

gator <-clean_names(gator)

g_wolves <-clean_names(g_wolves)

shark_1 <- clean_names(shark_1)

shark_2 <-clean_names(shark_2)
# converts all N/A chr values to actual missing values and fills in empty spaces with missing values 

g_wolves$type_of_attack[is.na(g_wolves$type_of_attack) | g_wolves$type_of_attack == ""] <- "Unknown"

shark_2$type[is.na(shark_2$type) | shark_2$type == "Invalid"] <- "Unknown"

shark_1 <- shark_1[!is.na(shark_1$type), ] 
shark_1_subset <- shark_1[1:6302, ]
nrow(shark_1)
## [1] 25723
length(shark_1$type)
## [1] 25723
# Check which rows have NA or invalid values
table(is.na(shark_1$type))  
## 
## FALSE 
## 25723
table(shark_1$type == "Invalid")  
## 
## FALSE  TRUE 
## 25176   547
# Replace only NAs first, then Invalids
shark_1$type[is.na(shark_1$type)] <- "Unknown"
shark_1$type[shark_1$type == "Invalid"] <- "Unknown"


# converts data structure to more appropriate data types
gator <-gator %>% 
  mutate(location = str_extract(details, "(Miami|Florida|Georgia|Texas|Louisiana|South Carolina)"),
         location = ifelse(location == "Miami", "Florida", location))

Alligator

Alligators are beautiful and dangerous animals. This data only shows a small amount of deaths that have been caused by alligators. There was no information about croc, even though they do have a hand in many deaths over the years, since they invaded the South. This data shows that Florida has had the most number of attacks. I have learned over the years is keep an eye out whenever by water on the Southern coast, you never know what might snap you up.

## # A tibble: 7 × 3
## # Groups:   location [5]
##   location       sex        n
##   <chr>          <chr>  <int>
## 1 Florida        female    11
## 2 Florida        male      19
## 3 Georgia        female     1
## 4 Louisiana      male       2
## 5 South Carolina female     4
## 6 South Carolina male       1
## 7 Texas          male       3

Unexpected Shark Attacks

When I first started trying to create this plot, this was the result. Attacks happening above Greenland. I learned how wrong I was.

knitr::include_graphics("media/Unexpected_Attacks.png")

Sharks

If any of you have a unrealistic fear that there is something in the water and you will get attacked. You’re not alone look is happening in Australia. Zoom out and see what is happening.

colnames(shark_3) <- c("latitude", "longitude")
center_lat <- -25.2744
center_lon <- 133.7751
zoom_level <- 5  

map <- leaflet() %>%
  addTiles() %>%  
  setView(center_lon, center_lat, zoom = zoom_level)

for (i in 1:nrow(shark_3)) {
  map <- map %>% addMarkers(lng = shark_3$longitude[i], lat = shark_3$latitude[i])
}

map
knitr::include_graphics("media/Australia.png")

This is what the map looks like when zoomed out. Crazy, right?

More Attacks

shark_1$type[is.na(shark_1$type) | shark_1$type == ""] <- "Unknown"
shark_1$type[is.na(shark_1$type) | shark_1$type == "Invalid"] <- "Unknown"
shark_1_subset <- shark_1[1:6302, ]

sharks <-shark_1 %>% 
  group_by(country) %>% 
  ggplot(aes(x = type)) +
   geom_bar(aes(y = ..count..)) + 
  geom_text(stat = 'count', aes(label = ..count..), vjust = -0.5) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

plot(sharks)

Does anyone have an idea what boatomg is? I think they meant boating, but messed up on their English. It is understandable to a degree, English is hard.

More Shark Attacks

shark_2$type[is.na(shark_2$type) | shark_2$type == "Invalid"] <- "Unknown"

shark_2 %>% 
  group_by(area) %>% 
  ggplot(aes(x = type)) +
   geom_bar(aes(y = ..count..)) + 
  geom_text(stat = 'count', aes(label = ..count..), vjust = -0.5) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Wolves

g_wolves$type_of_attack[is.na(g_wolves$type_of_attack) | g_wolves$type_of_attack == ""] <- "Unknown"

wolves <- g_wolves %>% 
  group_by(type_of_attack) %>% 
  ggplot(aes(x = type_of_attack))  +
  geom_bar(aes(y = ..count..)) + 
  geom_text(stat = 'count', aes(label = ..count..), vjust = -0.5) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(x = "Types of Attacks", y = "Victim Count")

wolves

More Wolves

When I was looking and trying to clean the data, there were many factors that was affecting it. The data was not pretty even after trying to clean it. They are even higher than data above, there was some data that represented 15 people that were killed by wolves, all 15 people were placed on the same line.

# Read the CSV file (assuming it's saved as 'global_wolves.csv')
data <- read.csv("predators/global_wolves.csv", stringsAsFactors = FALSE)

data <- clean_names(data)

# Use a regular expression to extract the country from the 'Location' column
data$country <- sub(".*,\\s*(.*)$", "\\1", data$location)

# Extract the country from location string
data$country <- str_extract(data$location, "[^,]+$")

# Trim any leading/trailing whitespace
data$country <- trimws(data$country)

# Replace blank box with NA
data <- data %>%
  mutate(type_of_attack = ifelse(type_of_attack == "", NA, type_of_attack))

w_attacks <-data %>% 
  ggplot(aes(x = country, fill = type_of_attack)) +
  geom_bar(position = "dodge") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +  
  labs(title = "Attacks and Country",
       x = "Country",
       y = "Count of Attacks") +
  theme_minimal()+
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.text.y = element_text(size = 8))  
knitr::include_graphics("media/wolves_attacks.png")

King

The King of attacks is the shark. The data shows that sharks have attacked, look at the map of Australia again, every marker is where a shark attack has occurred. Also another reason why the shark data is king is due to a lack and missing data from both the wolves and gators. If there was data for Nile Crocs, there might be some competition.

knitr::include_graphics("media/shark.png")

---
title: "Final_Project: Who is King of Attacks?"
author: "Meaghan Barrett"
date: "2025-04-08"
output: 
    html_document:
        theme: paper
        highlight: tango
        toc: true
        toc_float:
            collapsed: true
        number_sections: false
        code_download: true
        df_print: kable
        code_folding: show
        mode: selfcontained
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE, cache = TRUE, warning = FALSE, message = FALSE)
```

[Return to Homepage](../index.html)

# **Introduction**
In the following report, I will investigate and examine which animal is the king of attacks whether on land or in the water, by comparing the attacks from sharks, wolves, and alligators. While this data has already been somewhat cleaned and contains fairly simple information, this final project aims to investigate the disparities between these three animals, the attack locations, and the effected population. In simpler terms, who is responsible for the most attacks? Which populations are the most effected? Where these attacks provoked or not? These are just some of the questions I hope to explore in this writing. 

# **Background**
This project was inspired in part by my family, my mother is from Florida and there have been some very interesting documentaries done on all these animals. It was also inspired by general happen stance: I found a very nice data set on Kaggle and felt it could be an interesting take on a seemingly dull data set. As mentioned, this data set was found: 

[Fatal Alligator Attacks](https://www.kaggle.com/datasets/danela/fatal-alligator-attacks-us?resource=download&select=fatal_alligator_attacks_US.csv)

[Shark Attacks](https://www.kaggle.com/datasets/felipeesc/shark-attack-dataset?phase=FinishSSORegistration&returnUrl=%2Fdatasets%2Ffelipeesc%2Fshark-attack-dataset%2Fversions%2F1%3Fresource%3Ddownload&SSORegistrationToken=CfDJ8PHSCL9k9s1HuJ2cRFBFhuhgpxN0g_ATDITz_-cXVG-n5-S8PcAnZdgDXHbn7ud0iaVYLeYWkYnFTY6Nc4JFt1nyWAZsTuhR8vSPv3ok5TP4AtRRK9-IzGDqSzZKUGxMayKK5NKkdWgewUVYPMF1aJl4phPB4ObwXl2AK7698CE230yss9kgbAVKcZACBg00FmSPPkTsYGhlWu4z3VrezvZDXoLn2eYayI0784JDAnaa1L5KVsvpzolGTk9T8hn7uDtX29rwNRaQWy19BsV0KZ7TcfDfFpYvRD8rSMrq4yEul7-CRa2L1R5qWvxEOYMlGI-VFN87sgabOPrg_CJ6jcJaVo0CcsQ&DisplayName=Meaghan+Barrett)

[Shark Attacks By Hemispheres](https://www.kaggle.com/code/icecream4/shark-attacks-by-hemispheres/notebook) 

[Global Wolf Attacks](https://www.kaggle.com/datasets/danela/global-wolf-attacks?select=global_wolves.csv) 

The data set consists of various reported statistics regarding these different attacks. Most individuals were attacked were predatory. However, some individuals were reported to provoke the animal. Different age groups were attacked each time. 

Prevention would be great, but there is not signs for those that will and could be attacked. Alligators, sharks, and wolves have different reasons to attack.  

# **Data**
```{r, echo = TRUE}
# load all necessary libraries 
library(tidyverse)
library(janitor)
library(leaflet)
library(dplyr)
library(ggplot2)
library(lubridate)
library(stringr)

# reading in kaggle data sets 
gator <- read.csv("predators/fatal_alligator_attacks_US.csv")

g_wolves <- read.csv("predators/global_wolves.csv")

shark_1 <- read.csv("predators/Shark_attacks/attacks.csv")

shark_2 <- read.csv("predators/shark_attacks.csv")

shark_3 <- read.csv("predators/Shark_attacks/list_coor_australia.csv")

```
This data set was retrieved from Kaggle and has already been **somewhat** cleaned for analysis. However, there are some changes I wanted to make to the data structure, changing missing values/empty spaces, cleaning the names, and improving characters found throughout the data set. The following code and outputs demonstrate the changes I've made to allow for smoother data analysis: 

```{r, echo=TRUE}

# convert column titles to snake_case 

gator <-clean_names(gator)

g_wolves <-clean_names(g_wolves)

shark_1 <- clean_names(shark_1)

shark_2 <-clean_names(shark_2)
```

```{r eval = TRUE}
# converts all N/A chr values to actual missing values and fills in empty spaces with missing values 

g_wolves$type_of_attack[is.na(g_wolves$type_of_attack) | g_wolves$type_of_attack == ""] <- "Unknown"

shark_2$type[is.na(shark_2$type) | shark_2$type == "Invalid"] <- "Unknown"

shark_1 <- shark_1[!is.na(shark_1$type), ] 
shark_1_subset <- shark_1[1:6302, ]
nrow(shark_1)
length(shark_1$type)

# Check which rows have NA or invalid values
table(is.na(shark_1$type))  
table(shark_1$type == "Invalid")  

# Replace only NAs first, then Invalids
shark_1$type[is.na(shark_1$type)] <- "Unknown"
shark_1$type[shark_1$type == "Invalid"] <- "Unknown"


# converts data structure to more appropriate data types
gator <-gator %>% 
  mutate(location = str_extract(details, "(Miami|Florida|Georgia|Texas|Louisiana|South Carolina)"),
         location = ifelse(location == "Miami", "Florida", location))

```

# **Alligator**

Alligators are beautiful and dangerous animals. This data only shows a small amount of deaths that have been caused by alligators. There was no information about croc, even though they do have a hand in many deaths over the years, since they invaded the South. This data shows that Florida has had the most number of attacks. I have learned over the years is keep an eye out whenever by water on the Southern coast, you never know what might snap you up. 

```{r, echo = FALSE}
gator %>%
  mutate(date = as.Date(date, format = "%B %d, %Y")) %>%  
  filter(age != "?") %>%
  mutate(age = as.numeric(age)) %>%
  filter(age >= 2 & age <= 81) %>%
  mutate(year = as.integer(format(date, "%Y"))) %>%  
  arrange(age) %>%
  group_by(location) %>%
  ggplot(aes(x = factor(year), y = age, color = location)) + 
  geom_point() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(x = "Year", y = "Age")
```


```{r, echo=FALSE}
deaths <-gator %>%
  filter(location != "?") %>%  
  filter(sex != "?") %>%       
  group_by(location, sex) %>%
  tally()

print(deaths)

victims_by_state <-deaths %>% 
  ggplot(aes(x = location, y = n, fill = sex)) +
  geom_bar(stat = "identity", position = "dodge") +  
  geom_text(aes(label = n), position = position_dodge(width = 0.9), vjust = -0.5) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
  labs(title = "Alligator Attacks by State",
       x = "State", y = "Number of Victims", fill = "Sex")

victims_by_state
```

# **Unexpected Shark Attacks**
When I first started trying to create this plot, this was the result. Attacks happening above Greenland. I learned how wrong I was. 
```{r, out.width="70%", out.height="70%"}
knitr::include_graphics("media/Unexpected_Attacks.png")
```

# **Sharks**
If any of you have a unrealistic fear that there is something in the water and you will get attacked. You're not alone look is happening in Australia. Zoom out and see what is happening.

```{r, echo = TRUE}

colnames(shark_3) <- c("latitude", "longitude")
center_lat <- -25.2744
center_lon <- 133.7751
zoom_level <- 5  

map <- leaflet() %>%
  addTiles() %>%  
  setView(center_lon, center_lat, zoom = zoom_level)

for (i in 1:nrow(shark_3)) {
  map <- map %>% addMarkers(lng = shark_3$longitude[i], lat = shark_3$latitude[i])
}

map

```


```{r, out.width="70%", out.height="70%"}
knitr::include_graphics("media/Australia.png")
```


This is what the map looks like when zoomed out. Crazy, right?


#  **More Attacks**
```{r, echo=TRUE}

shark_1$type[is.na(shark_1$type) | shark_1$type == ""] <- "Unknown"
shark_1$type[is.na(shark_1$type) | shark_1$type == "Invalid"] <- "Unknown"
shark_1_subset <- shark_1[1:6302, ]

sharks <-shark_1 %>% 
  group_by(country) %>% 
  ggplot(aes(x = type)) +
   geom_bar(aes(y = ..count..)) + 
  geom_text(stat = 'count', aes(label = ..count..), vjust = -0.5) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

plot(sharks)
```


Does anyone have an idea what boatomg is? I think they meant boating, but messed up on their English. It is understandable to a degree, English is hard. 


# **More Shark Attacks**
```{r, echo = TRUE}

shark_2$type[is.na(shark_2$type) | shark_2$type == "Invalid"] <- "Unknown"

shark_2 %>% 
  group_by(area) %>% 
  ggplot(aes(x = type)) +
   geom_bar(aes(y = ..count..)) + 
  geom_text(stat = 'count', aes(label = ..count..), vjust = -0.5) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

```


#  **Wolves**

```{r, echo=TRUE}
g_wolves$type_of_attack[is.na(g_wolves$type_of_attack) | g_wolves$type_of_attack == ""] <- "Unknown"

wolves <- g_wolves %>% 
  group_by(type_of_attack) %>% 
  ggplot(aes(x = type_of_attack))  +
  geom_bar(aes(y = ..count..)) + 
  geom_text(stat = 'count', aes(label = ..count..), vjust = -0.5) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(x = "Types of Attacks", y = "Victim Count")

wolves
```

# **More Wolves** 
When I was looking and trying to clean the data, there were many factors that was affecting it. The data was not pretty even after trying to clean it. They are even higher than data above, there was some data that represented 15 people that were killed by wolves, all 15 people were placed on the same line. 

```{r, echo=TRUE}
# Read the CSV file (assuming it's saved as 'global_wolves.csv')
data <- read.csv("predators/global_wolves.csv", stringsAsFactors = FALSE)

data <- clean_names(data)

# Use a regular expression to extract the country from the 'Location' column
data$country <- sub(".*,\\s*(.*)$", "\\1", data$location)

# Extract the country from location string
data$country <- str_extract(data$location, "[^,]+$")

# Trim any leading/trailing whitespace
data$country <- trimws(data$country)

# Replace blank box with NA
data <- data %>%
  mutate(type_of_attack = ifelse(type_of_attack == "", NA, type_of_attack))

w_attacks <-data %>% 
  ggplot(aes(x = country, fill = type_of_attack)) +
  geom_bar(position = "dodge") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +  
  labs(title = "Attacks and Country",
       x = "Country",
       y = "Count of Attacks") +
  theme_minimal()+
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.text.y = element_text(size = 8))  
```

```{r, out.width="70%", out.height="70%"}
knitr::include_graphics("media/wolves_attacks.png")
```

# **King**

The King of attacks is the shark. The data shows that sharks have attacked, look at the map of Australia again, every marker is where a shark attack has occurred. Also another reason why the shark data is king is due to a lack and missing data from both the wolves and gators. If there was data for Nile Crocs, there might be some competition. 


```{r, out.width="70%", out.height="70%"}
knitr::include_graphics("media/shark.png")
```


